---
title: EcoAssistant - Using LLM Assistants More Accurately and Affordably
authors: jieyuz2
tags: [LLM, RAG, cost-effectiveness]
---

![system](img/system.png)

**TL;DR:**
* Introducing the **EcoAssistant**, which is designed to solve user queries more accurately and affordably.
* We show how to let the LLM assistant agent leverage external API to solve user query.
* We show how to reduce the cost of using GPT models via **Assistant Hierarchy**.
* We show how to leverage the idea of Retrieval-augmented Generation (RAG) to improve the success rate via **Solution Demonstration**.


## EcoAssistant

In this blog, we introduce the **EcoAssistant**, a system built upon AutoGen with the goal of solving user queries more accurately and affordably.

### Problem setup

Recently, users have been using conversational LLMs such as ChatGPT for various queries.
Reports indicate that 23% of ChatGPT user queries are for knowledge extraction purposes.
Many of these queries require knowledge that is external to the information stored within any pre-trained large language models (LLMs).
These tasks can only be completed by generating code to fetch necessary information via external APIs that contain the requested information.
In the table below, we show three types of user queries that we aim to address in this work.

| Dataset | API | Example query |
|-------------|----------|----------|
| Places| [Google Places](https://developers.google.com/maps/documentation/places/web-service/overview) | I’m looking for a 24-hour pharmacy in Montreal, can you find one for me? |
| Weather | [Weather API](https://www.weatherapi.com) | What is the current cloud coverage in Mumbai, India? |
| Stock | [Alpha Vantage Stock API](https://www.alphavantage.co/documentation/) | Can you give me the opening price of Microsoft for the month of January 2023? |


### Leveraging external APIs

To address these queries, we first build a **two-agent system** based on AutoGen,
where the first agent is a **LLM assistant agent** (`AssistantAgent` in AutoGen) that is responsible for proposing and refining the code and
the second agent is a **code executor agent** (`UserProxyAgent` in AutoGen) that would extract the generated code and execute it, forwarding the output back to the LLM assistant agent.
A visualization of the two-agent system is shown below.

![chat](img/chat.png)

To instruct the assistant agent to leverage external APIs, we only need to add the API name/key dictionary at the beginning of the initial message.
The template is shown below, where the red part is the information of APIs and black part is user query.

![template](img/template.png)

Importantly, we don't want to reveal our real API key to the assistant agent for safety concerns.
Therefore, we use a **fake API key** to replace the real API key in the initial message.
In particular, we generate a random token (e.g., `181dbb37`) for each API key and replace the real API key with the token in the initial message.
Then, when the code executor execute the code, the fake API key would be automatically replaced by the real API key.


### Solution Demonstration
In most practical scenarios, queries from users would appear sequentially over time.
Our **EcoAssistant** leverages past success to help the LLM assistants address future queries via **Solution Demonstration**.
Specifically, whenever a query is deemed successfully resolved by user feedback, we capture and store the query and the final generated code snippet.
These query-code pairs are saved in a specialized vector database. When new queries appear, **EcoAssistant** retrieves the most similar query from the database, which is then appended with the associated code to the initial prompt for the new query, serving as a demonstration.
The new template of initial message is shown below, where the blue part corresponds to the solution demonstration.

![template](img/template-demo.png)

We found that this utilization of past successful query-code pairs improves the query resolution process with fewer iterations and enhances the system's performance.


### Assistant Hierarchy
LLMs usually have different prices and performance, for example, GPT-3.5-turbo is much cheaper than GPT-4 but also less accurate.
Thus, we propose the **Assistant Hierarchy** to reduce the cost of using LLMs.
The core idea is that we use the cheaper LLMs first and only use the more expensive LLMs when necessary.
By this way, we are able to reduce the reliance on expensive LLMs and thus reduce the cost.
In particular, given multiple LLMs, we initiate one assistant agent for each and start the conversation with the most cost-effective LLM assistant.
If the conversation between the current LLM assistant and the code executor concludes without successfully resolving the query, **EcoAssistant** would then restart the conversation with the next more expensive LLM assistant in the hierarchy.
We found that this strategy significantly reduces costs while still effectively addressing queries.

### A Synergistic Effect
We found that the **Assistant Hierarchy** and **Solution Demonstration** of **EcoAssistant** have a synergistic effect.
Because the query-code database is shared by all LLM assistants, even without specialized design,
the solution from more powerful LLM assistant (e.g., GPT-4) could be later retrieved to guide weaker LLM assistant (e.g., GPT-3.5-turbo).
Such a synergistic effect further improves the performance and reduces the cost of **EcoAssistant**.

### Experimental Results

We evaluate **EcoAssistant** on three datasets: Places, Weather, and Stock. When comparing it with a single GPT-4 assistant, we found that **EcoAssistant** achieves a higher success rate with a lower cost as shown in the figure below.
For more details about the experimental results and other experiments, please refer to our [paper](https://arxiv.org/abs/2310.03046).

![exp](img/results.png)

## Further reading

Please refer to our [paper](https://arxiv.org/abs/2310.03046) and [codebase](https://github.com/JieyuZ2/EcoAssistant) for more details about **EcoAssistant**.

If you find this blog useful, please consider citing:

```bibtex
@article{zhang2023ecoassistant,
  title={EcoAssistant: Using LLM Assistant More Affordably and Accurately},
  author={Zhang, Jieyu and Krishna, Ranjay and Awadallah, Ahmed H and Wang, Chi},
  journal={arXiv preprint arXiv:2310.03046},
  year={2023}
}
```
